Big Five

Welcome to the website. I hope you enjoy it!

Introduction

This is an analysis of Big Five results, with raw data collected from Open Psychometrics. It contains over 1 million results from various countries, making it quite a large file to read through.

My original goal for this project was to dabble in a bit of data science, since data genuinely fascinates me. So, learning how to parse, clean, and modify it is always going to be a fun endeavor. However, I wanted to drag a bit of psychology in here, just to make it more engaging for me!

If you’re interested in a specific part, you may skip ahead:

Introduction

  1. Objectives
  2. What is the Big Five (SLOAN)
  3. Discussing Data

Data

  1. What is the Most Popular Result?
  2. Does this result vary between countries?
  3. How long did people spend on questions?

Further Analysis

  1. Limitations of Big Five
  2. Connection of Big Five and Daily Life

Objectives

After looking at the raw data, there were several questions I wanted to answer:

  1. What are the actual results of the data?
  2. Which result was the most popular? How does that compare to other data sets?
  3. Did these results vary significantly between countries?
  4. Did certain answers require more time?
  5. Why did this quiz get the results that it did?

The first four are purely data-based, as they’re just an exploration of the quiz answers. Number five, on the other hand, is trying to understand WHY the results are the way they are. Personally, I find that it’s the most interesting aspect of this: the results are cool, but understanding the people behind the quiz is far more entertaining.

However, I’m also looking to learn more about the Big Five as a personality model, with a specific interest in common trends and future applications. This means that the second half of this investigation will focus on psychology rather than data science, with a heavier emphasis on writing than code. Furthermore, the tone will remain casual throughout, and I will frequently use personal pronouns and other scientific no-gos. It’s not an academic paper for a reason!

What is the Big Five (SLOAN)

The Big Five is a personality model that categorizes people along five major personality traits.

Hence, a possible result could look like RLOEN, standing for reserved, limbic, organized, egocentric and non-inquisitive. Each letter is simply the first letter of the corresponding trait. However, an ‘X’ might also appear. This means the person scored equally on both sides of that trait, so there is no conclusive answer for it, as in SXUAN.

The specific order and formatting of these traits follows the SLOAN format, and this investigation will continue using this syntax. However, other common formats include OCEAN and CANOE. The specific wording of traits (e.g. Extroversion, Conscientiousness) may also vary from one investigation to another. This investigation will continue using the wording above.

These five major traits can also be broken down further: Extroversion can also contain traits like gregariousness and excitement-seeking. However, these subsets are not within the scope of this investigation.

Furthermore, it should be noted that one person is not entirely one or the other. People exist on a spectrum of these results, despite how black and white they seem to be portrayed. The Big Five result tends to

Interested in what your results are? The Open Psychometrics Big Five quiz can be taken here.


Discussing Data

This quiz is self-administered, with users given 50 questions, 10 for each trait. Every question was answered on a 5-point Likert scale, with 1 meaning “I disagree with the prompt”, 3 meaning “I am neutral on the prompt”, and 5 meaning “I agree with the prompt”.

Some sample prompts include:

Questions from each category were rotated, starting from Extroversion Q1, Agreeableness Q1, Conscientiousness Q1, Neuroticism Q1, Inquisition Q1, Extroversion Q2, etc.
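To make the rotation concrete, here’s a tiny sketch that rebuilds the question order described above. Only the EXT prefix shows up in this write-up; the other four labels (AGR, CSN, EST, OPN) are placeholder names I’m assuming for illustration, so swap in whatever the actual column names are.

```python
# Trait prefixes in the order the quiz rotates through them.
# "EXT" is used in this write-up; the other four labels are assumed placeholders.
TRAITS = ["EXT", "AGR", "CSN", "EST", "OPN"]
QUESTIONS_PER_TRAIT = 10

# Outer loop over the question number, inner loop over traits,
# so the order goes EXT1, AGR1, CSN1, EST1, OPN1, EXT2, ...
question_order = [
    f"{trait}{number}"
    for number in range(1, QUESTIONS_PER_TRAIT + 1)
    for trait in TRAITS
]

print(question_order[:6])
print(len(question_order))
```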

Data was also collected on the country of the quiz-taker, the amount of time spent on each question, date taken, and more. In total, 1,015,342 answers were collected over ~2 years, with consent from the user.

Datalicious Depths of Delectable Data

Unfortunately, within this data set, Open Psychometrics did not provide us with the result, but with a bunch of numbers. In fact, this is what the data looks like:

I’ve only selected the first 3 columns of the first 6 responses. Not too bad, right? But, you’ve gotta imagine about 50 more columns, and about a million more rows. It’s a little more intimidating now, but we’ve gotta start somewhere >:)

First, let’s calculate the scores for the Extroversion personality trait. We can do this by summing up the scores of the 10 EXT (short for EXTroversion) questions. Of course, it should be noted that answering a ‘4’ or ‘5’ on an EXT question doesn’t always mean that a user is more extroverted. For example, compare EXT Question 1 (‘I am the life of the party’) with EXT Question 2 (‘I don’t talk a lot’): agreeing with the first points towards extroversion, while agreeing with the second points towards introversion.

So, we’ll need to make sure to add and subtract scores as necessary - I’ve arbitrarily made the decision to add ‘points’ if a question symbolizes extroversion, while subtracting ‘points’ if a question is correlated with introversion.

Hence, after adding up all the scores, any positive score (> 0) will mean that the user receives an ‘S’ for Sociable, while a negative score (< 0) would result in an ‘R’ for Reserved. If their score is exactly 0, then they’d receive an ‘X’, as their results can’t really be evaluated since they’re perfectly in-between!
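Here’s a minimal sketch of that add-and-subtract scoring in plain Python. It isn’t the exact code behind this project. The column names (EXT1 to EXT10) and the keying are assumptions: the text only confirms that Question 1 counts towards extroversion and Question 2 towards introversion, and I’ve assumed the remaining questions alternate the same way.

```python
# Assumed keying: +1 if agreeing signals extroversion, -1 if it signals introversion.
# Only EXT1 (+) and EXT2 (-) are confirmed by the text; the alternating pattern
# for the remaining questions is an assumption for illustration.
EXT_KEY = {f"EXT{i}": (1 if i % 2 == 1 else -1) for i in range(1, 11)}


def ext_score(response):
    """Add the answers to extroversion-keyed questions, subtract the others."""
    return sum(sign * response[question] for question, sign in EXT_KEY.items())


def ext_letter(score):
    """Map a signed score to 'S' (Sociable), 'R' (Reserved) or 'X' (tie)."""
    if score > 0:
        return "S"
    if score < 0:
        return "R"
    return "X"


# A made-up response on the 1-5 scale, purely for demonstration.
sample = {f"EXT{i}": answer
          for i, answer in enumerate([4, 2, 5, 1, 4, 2, 5, 3, 3, 1], start=1)}

score = ext_score(sample)
print(score, ext_letter(score))
```

With five questions keyed each way, someone who answers ‘3’ to everything lands on exactly 0 and gets an ‘X’.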

Sample table of results:


Visualizing this data into a graph:

I’m going to refrain from any analysis on WHY there are more R’s than S’s until the final results. But, it’s a pretty 50/50 split!

The same process will be applied to the other personality traits*, with the graphs of results located below:

After getting these values, we can now combine them together to get the results. I’m also going to limit this to the top 25 results… or else the data just gets super messy.
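As a rough sketch of that combining step, the five per-trait letters can be joined into one result string and then tallied. The function names and the sample letters are made up for illustration, and `Counter.most_common` is just one convenient way to keep only the top results.

```python
from collections import Counter


def combine(letters):
    """Join the five per-trait letters (in SLOAN order) into one result string."""
    return "".join(letters)


def top_results(results, n=25):
    """Return the n most common result strings with their counts."""
    return Counter(results).most_common(n)


# Made-up per-trait letters for four quiz-takers, purely for demonstration.
per_trait_letters = [
    ("S", "C", "O", "A", "I"),
    ("R", "L", "O", "A", "I"),
    ("S", "C", "O", "A", "I"),
    ("R", "L", "U", "A", "I"),
]

results = [combine(letters) for letters in per_trait_letters]
print(top_results(results, n=2))
```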


WOOO! Congratulations to the XXOAI family (with the X’s here standing in for any letter) for being among the most common results out of 243 possibilities! If you’d prefer to see a more numerical visual, I’ve provided a table containing the 10 most common results below:

Results   Total Amount   Percentage of People
SCOAI     160758         15.832892
RLOAI     132305         13.030585
RCOAI     106355         10.474796
RLUAI     98833          9.733962
SLOAI     96998          9.553234
SLUAI     72098          7.100859
SCUAI     54528          5.370407
RCUAI     40040          3.943499
RLXAI     14370          1.415287
SXOAI     11818          1.163943
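For the curious, here’s a small sketch of where the 243 comes from and how a percentage in the table can be reproduced. Each of the five traits has three possible letters (one per side, plus ‘X’), so there are 3 × 3 × 3 × 3 × 3 = 243 combinations. I’m assuming the percentages are counts divided by the 1,015,342 total responses, which at least matches the SCOAI row.

```python
from itertools import product

# Three possible letters per trait, in SLOAN order; 'X' marks a tie.
TRAIT_LETTERS = [
    ("S", "R", "X"),
    ("L", "C", "X"),
    ("O", "U", "X"),
    ("A", "E", "X"),
    ("I", "N", "X"),
]

# Every possible five-letter result.
all_results = ["".join(combo) for combo in product(*TRAIT_LETTERS)]

# Assumed denominator: the total number of responses quoted in this write-up.
TOTAL_RESPONSES = 1_015_342


def percentage(count, total=TOTAL_RESPONSES):
    """Share of all responses, as a percentage."""
    return 100 * count / total


print(len(all_results))
print(round(percentage(160758), 6))
```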

When analyzing this data, it seems... strange that so many XXOAI types are represented. In fact, it readily contradicts ‘theory’. Using the data* from SimilarMinds, we can see how strange these results are.

Table 1: Theory vs. Experimental (Percentage of People)
Results   Theory   Experimental
SCOAI     3.4      15.832892
RLOAI     2.7      13.030585
RCOAI     3.5      10.474796
RLUAI     N/A      9.733962
SLOAI     2.4      9.553234
SLUAI     3.4      7.100859
SCUAI     3.5      5.370407
RCUAI     N/A      3.943499
RLXAI     N/A      1.415287
SXOAI     N/A      1.163943